This document presents an exploratory data analysis of critical element concentration data. The data describe the concentration of critical minerals across different areas in Australia, normalised against the PAAS (Post-Archean Australian Shale) standard. The primary objective of this analysis is to gain a deeper understanding of the data’s structure and key characteristics. Through this, we aim to identify significant trends, correlations, and outliers that may influence the outcomes of the study.
Our data comprise 11,032 observations and 8 variables. Five of the variables are of character type; the remaining three, Element_Value_ppm, PAAS_value_ppm, and PAAS_normalised_value, are numeric.
## Classes 'data.table' and 'data.frame': 11032 obs. of 8 variables:
## $ Project_Name : chr "Collingwood Park" "Confidential_B" "Confidential_B" "Confidential_B" ...
## $ Sample_ID : chr "CP-014" "ICP23000472Z291" "ICP23000472Z292" "ICP23000472Z293" ...
## $ Element_Symbol : chr "Ag" "Ag" "Ag" "Ag" ...
## $ Element_Value_ppm : num 0.13 0.14 0.11 0.11 0.11 0.11 0.11 0.11 0.15 0.5 ...
## $ Element_Description : chr "Silver" "Silver" "Silver" "Silver" ...
## $ PAAS_value_ppm : num 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 0.05 ...
## $ PAAS_normalised_value: num 2.6 2.8 2.2 2.2 2.2 2.2 2.2 2.2 3 10 ...
## $ Above_PASS_flag : chr "Enriched above background" "Enriched above background" "Enriched above background" "Enriched above background" ...
## - attr(*, ".internal.selfref")=<externalptr>
Table 1.1 provides a preview of the data, showing the first ten observations.
| Project Name | Sample ID | Element Symbol | Element Value ppm | Element Description | PAAS value ppm | PAAS normalised value | Above PASS flag |
|---|---|---|---|---|---|---|---|
| Collingwood Park | CP-014 | Ag | 0.13 | Silver | 0.05 | 2.6 | Enriched above background |
| Confidential_B | ICP23000472Z291 | Ag | 0.14 | Silver | 0.05 | 2.8 | Enriched above background |
| Confidential_B | ICP23000472Z292 | Ag | 0.11 | Silver | 0.05 | 2.2 | Enriched above background |
| Confidential_B | ICP23000472Z293 | Ag | 0.11 | Silver | 0.05 | 2.2 | Enriched above background |
| Confidential_B | ICP23000472Z294 | Ag | 0.11 | Silver | 0.05 | 2.2 | Enriched above background |
| Confidential_B | ICP23000472Z299 | Ag | 0.11 | Silver | 0.05 | 2.2 | Enriched above background |
| Confidential_B | ICP23000472Z300 | Ag | 0.11 | Silver | 0.05 | 2.2 | Enriched above background |
| Confidential_B | ICP23000472Z301 | Ag | 0.11 | Silver | 0.05 | 2.2 | Enriched above background |
| Confidential_B | ICP23000472Z302 | Ag | 0.15 | Silver | 0.05 | 3.0 | Enriched above background |
| Confidential_C | IP23005174R1046 | Ag | 0.50 | Silver | 0.05 | 10.0 | Enriched above background |
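The preview suggests that the normalised value is simply the measured concentration divided by the PAAS reference concentration (for example, 0.13 / 0.05 = 2.6 for the first sample). The sketch below checks this assumed relationship on a few rows copied from Table 1.1; the `preview` data frame and the `recomputed` column are names introduced here for illustration only.

```r
# A few rows copied by hand from Table 1.1 (illustration only)
preview <- data.frame(
  Sample_ID             = c("CP-014", "ICP23000472Z291", "ICP23000472Z302", "IP23005174R1046"),
  Element_Value_ppm     = c(0.13, 0.14, 0.15, 0.50),
  PAAS_value_ppm        = c(0.05, 0.05, 0.05, 0.05),
  PAAS_normalised_value = c(2.6, 2.8, 3.0, 10.0)
)

# Assumed relationship: normalised value = measured ppm / PAAS reference ppm
preview$recomputed <- preview$Element_Value_ppm / preview$PAAS_value_ppm

# Compare with a tolerance, because floating-point division is not exact
all_match <- isTRUE(all.equal(preview$recomputed, preview$PAAS_normalised_value))
print(all_match)
```

If the same check were run on the full dataset, any row where the two values disagree would be worth inspecting before further analysis.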
| Element Symbol | Element Description | min | max | mean | median | range | q1 | q3 | iqr | sd | var | skewness | kurtosis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Ag | Silver | 0.100 | 0.660 | 0.207 | 0.180 | 0.560 | 0.110 | 0.272 | 0.163 | 0.119 | 0.014 | 1.647 | 5.783 |
| Al | Aluminium | 7.210 | 680,000.000 | 67,579.252 | 35,000.000 | 679,992.790 | 17,000.000 | 103,000.000 | 86,000.000 | 78,592.423 | 6,176,769,029.641 | 4.403 | 34.192 |
| Au | Gold | 0.001 | 0.022 | 0.006 | 0.006 | 0.021 | 0.004 | 0.008 | 0.004 | 0.004 | 0.000 | 1.806 | 7.764 |
| Ba | Barium | 7.000 | 10,000.000 | 609.458 | 412.500 | 9,993.000 | 189.750 | 653.250 | 463.500 | 1,114.878 | 1,242,952.531 | 5.997 | 43.446 |
| Be | Beryllium | 0.100 | 18.000 | 3.377 | 2.000 | 17.900 | 1.000 | 4.000 | 3.000 | 3.401 | 11.564 | 1.981 | 7.156 |
| Bi | Bismuth | 0.100 | 3.200 | 0.468 | 0.305 | 3.100 | 0.200 | 0.600 | 0.400 | 0.419 | 0.176 | 3.012 | 17.818 |
| Cd | Cadmium | 0.010 | 0.720 | 0.125 | 0.090 | 0.710 | 0.050 | 0.185 | 0.135 | 0.109 | 0.012 | 2.075 | 9.839 |
| Ce | Cerium | 3.600 | 380.000 | 61.211 | 63.050 | 376.400 | 31.125 | 81.375 | 50.250 | 38.062 | 1,448.728 | 2.362 | 21.060 |
| Co | Cobalt | 2.000 | 134.000 | 15.400 | 10.000 | 132.000 | 6.000 | 15.250 | 9.250 | 19.268 | 371.261 | 3.617 | 17.661 |
| Cr | Chromium | 1.000 | 897.000 | 36.565 | 17.000 | 896.000 | 11.000 | 29.000 | 18.000 | 86.951 | 7,560.393 | 6.622 | 54.642 |
| Cs | Caesium | 0.110 | 31.600 | 4.915 | 4.245 | 31.490 | 2.445 | 6.357 | 3.912 | 4.188 | 17.540 | 2.460 | 12.858 |
| Cu | Copper | 1.000 | 255.000 | 42.025 | 47.000 | 254.000 | 16.000 | 60.750 | 44.750 | 28.991 | 840.485 | 1.617 | 13.140 |
| Dy | Dysprosium | 0.400 | 18.500 | 5.325 | 5.310 | 18.100 | 3.500 | 6.715 | 3.215 | 2.965 | 8.790 | 1.115 | 5.566 |
| Er | Erbium | 0.200 | 11.400 | 3.222 | 3.035 | 11.200 | 2.000 | 4.072 | 2.072 | 1.875 | 3.517 | 1.300 | 6.383 |
| Eu | Europium | 0.100 | 6.840 | 1.474 | 1.450 | 6.740 | 0.800 | 1.920 | 1.120 | 0.852 | 0.727 | 1.295 | 8.535 |
| Fe | Iron | 0.190 | 339,126.000 | 16,595.088 | 6,301.000 | 339,125.810 | 2,000.000 | 15,000.000 | 13,000.000 | 37,320.398 | 1,392,812,116.408 | 5.788 | 45.053 |
| Ga | Gallium | 1.300 | 52.300 | 21.981 | 21.950 | 51.000 | 12.900 | 31.625 | 18.725 | 11.620 | 135.019 | -0.020 | 2.148 |
| Gd | Gadolinium | 0.500 | 25.100 | 6.038 | 5.750 | 24.600 | 3.600 | 7.555 | 3.955 | 3.424 | 11.725 | 1.380 | 7.420 |
| Ge | Germanium | 0.140 | 70.000 | 11.970 | 0.550 | 69.860 | 0.292 | 15.200 | 14.907 | 20.849 | 434.698 | 1.458 | 3.653 |
| HREE | Ho+Er+Tm+Yb+Lu | 1.100 | 30.600 | 8.671 | 8.200 | 29.500 | 5.462 | 10.753 | 5.290 | 4.812 | 23.153 | 1.374 | 6.712 |
| Ho | Holmium | 0.100 | 3.800 | 1.081 | 1.045 | 3.700 | 0.700 | 1.355 | 0.655 | 0.613 | 0.376 | 1.187 | 5.962 |
| In | Indium | 0.020 | 0.360 | 0.072 | 0.050 | 0.340 | 0.030 | 0.100 | 0.070 | 0.056 | 0.003 | 2.167 | 9.581 |
| LREE | La+Ce+Pr+Nd+Pm+Sm | 11.700 | 554.500 | 134.275 | 140.060 | 542.800 | 81.100 | 180.300 | 99.200 | 71.054 | 5,048.606 | 0.837 | 6.992 |
| La | Lanthanum | 3.000 | 76.200 | 26.129 | 27.900 | 73.200 | 12.000 | 36.000 | 24.000 | 14.409 | 207.612 | 0.155 | 2.566 |
| Li | Lithium | 5.000 | 285.000 | 47.724 | 40.000 | 280.000 | 15.000 | 64.500 | 49.500 | 39.486 | 1,559.133 | 1.860 | 9.533 |
| Lu | Lutetium | 0.000 | 2.000 | 0.493 | 0.460 | 2.000 | 0.300 | 0.610 | 0.310 | 0.294 | 0.087 | 1.420 | 7.067 |
| MREE | Eu+Gd+Tb+Dy+Y | 2.700 | 135.100 | 41.948 | 41.410 | 132.400 | 24.700 | 52.330 | 27.630 | 23.288 | 542.320 | 1.200 | 5.952 |
| Mn | Manganese | 1.000 | 11,230.000 | 356.239 | 65.500 | 11,229.000 | 19.000 | 220.500 | 201.500 | 1,130.209 | 1,277,371.401 | 7.302 | 66.528 |
| Mo | Molybdenum | 0.100 | 20.600 | 4.456 | 4.000 | 20.500 | 2.000 | 5.000 | 3.000 | 3.292 | 10.836 | 2.227 | 10.237 |
| Nb | Niobium | 0.500 | 42.800 | 7.466 | 7.765 | 42.300 | 4.367 | 9.418 | 5.050 | 4.824 | 23.273 | 2.399 | 16.601 |
| Nd | Neodymium | 1.700 | 115.000 | 29.777 | 31.750 | 113.300 | 14.750 | 39.525 | 24.775 | 16.758 | 280.838 | 0.622 | 4.693 |
| Ni | Nickel | 1.000 | 360.000 | 16.972 | 7.000 | 359.000 | 5.000 | 13.000 | 8.000 | 38.253 | 1,463.294 | 5.983 | 44.099 |
| Pb | Lead | 0.890 | 83.450 | 21.688 | 21.000 | 82.560 | 11.060 | 28.000 | 16.940 | 14.544 | 211.517 | 1.344 | 6.139 |
| Pr | Praseodymium | 0.400 | 33.400 | 7.335 | 7.750 | 33.000 | 3.500 | 9.848 | 6.348 | 4.235 | 17.935 | 0.953 | 7.452 |
| REE | La+Ce+Pr+Nd+Sm+Eu+Gd+Tb+Dy+Ho+Er+Tm+Yb+Lu | 19.600 | 611.000 | 159.328 | 165.430 | 591.400 | 107.100 | 205.510 | 98.410 | 78.891 | 6,223.739 | 0.832 | 6.574 |
| REEY | La+Ce+Pr+Nd+Sm+Eu+Gd+Tb+Dy+Ho+Er+Tm+Yb+Lu+Y | 23.200 | 613.000 | 188.576 | 193.850 | 589.800 | 124.100 | 237.130 | 113.030 | 88.957 | 7,913.437 | 0.613 | 4.600 |
| Rb | Rubidium | 0.160 | 299.000 | 52.910 | 49.400 | 298.840 | 14.300 | 77.825 | 63.525 | 44.400 | 1,971.391 | 1.185 | 5.920 |
| Re | Rhenium | 0.000 | 0.003 | 0.002 | 0.002 | 0.003 | 0.001 | 0.002 | 0.002 | 0.001 | 0.000 | -0.526 | 1.792 |
| Sc | Scandium | 2.200 | 67.800 | 15.465 | 16.200 | 65.600 | 9.325 | 19.675 | 10.350 | 8.300 | 68.898 | 1.108 | 8.482 |
| Sm | Samarium | 0.400 | 21.100 | 6.394 | 6.570 | 20.700 | 3.450 | 8.485 | 5.035 | 3.532 | 12.473 | 0.624 | 4.173 |
| Sn | Tin | 1.000 | 13.000 | 3.991 | 3.600 | 12.000 | 2.700 | 4.800 | 2.100 | 1.925 | 3.705 | 1.671 | 6.862 |
| Sr | Strontium | 2.000 | 1,600.000 | 334.182 | 320.500 | 1,598.000 | 170.000 | 467.750 | 297.750 | 212.895 | 45,324.213 | 1.012 | 6.965 |
| Ta | Tantalum | 0.100 | 3.000 | 0.649 | 0.690 | 2.900 | 0.438 | 0.800 | 0.363 | 0.361 | 0.131 | 2.084 | 14.453 |
| Tb | Terbium | 0.100 | 2.930 | 0.882 | 0.870 | 2.830 | 0.600 | 1.100 | 0.500 | 0.486 | 0.236 | 1.103 | 5.576 |
| Th | Thorium | 0.470 | 57.020 | 12.542 | 11.850 | 56.550 | 5.640 | 16.200 | 10.560 | 9.130 | 83.359 | 1.437 | 6.382 |
| Tl | Thallium | 0.030 | 10.000 | 1.942 | 0.715 | 9.970 | 0.362 | 1.510 | 1.147 | 3.123 | 9.753 | 2.131 | 5.782 |
| Tm | Thulium | 0.100 | 1.800 | 0.479 | 0.450 | 1.700 | 0.300 | 0.600 | 0.300 | 0.269 | 0.073 | 1.551 | 7.679 |
| U | Uranium | 0.150 | 12.000 | 3.610 | 3.650 | 11.850 | 1.625 | 5.018 | 3.393 | 2.255 | 5.083 | 0.541 | 3.219 |
| V | Vanadium | 2.000 | 460.000 | 111.060 | 118.000 | 458.000 | 50.000 | 150.000 | 100.000 | 71.213 | 5,071.349 | 0.961 | 5.959 |
| Y | Yttrium | 1.000 | 100.500 | 28.110 | 27.500 | 99.500 | 17.000 | 35.900 | 18.900 | 16.811 | 282.623 | 1.287 | 6.443 |
| Yb | Ytterbium | 0.200 | 11.900 | 3.217 | 3.080 | 11.700 | 1.975 | 4.102 | 2.128 | 1.879 | 3.531 | 1.259 | 6.361 |
| Zn | Zinc | 1.000 | 307.000 | 65.378 | 66.000 | 306.000 | 17.000 | 101.000 | 84.000 | 49.645 | 2,464.655 | 0.745 | 4.134 |
| Zr | Zirconium | 4.000 | 916.000 | 175.363 | 186.000 | 912.000 | 93.750 | 228.000 | 134.250 | 112.505 | 12,657.471 | 1.419 | 9.671 |
The most important statistic in this table is kurtosis. Kurtosis measures the combined weight of the tails of a distribution relative to its centre, so it can serve as an indicator of the presence of outliers: a high kurtosis value suggests that outliers are present. Validating the outliers is easier with data visualisation, which is presented in the next section.
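As a rough illustration of how per-element statistics like these can be computed, the sketch below uses base R only. It assumes moment-based definitions of skewness and kurtosis, with kurtosis not reduced by 3 (so a normal distribution scores about 3); the report does not state which definition or package was used, so this is an assumption. The helper names (`skewness_m`, `kurtosis_m`, `describe`) and the toy values are made up for the example.

```r
# Moment-based skewness and kurtosis (assumed definitions; kurtosis is not
# "excess" kurtosis, so a normal distribution scores about 3)
skewness_m <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
kurtosis_m <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

# All summary statistics for one numeric vector
describe <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  c(min = min(x), max = max(x), mean = mean(x), median = median(x),
    range = max(x) - min(x), q1 = q[1], q3 = q[2], iqr = q[2] - q[1],
    sd = sd(x), var = var(x),
    skewness = skewness_m(x), kurtosis = kurtosis_m(x))
}

# Toy long-format data standing in for the real dataset (values are made up)
toy <- data.frame(
  Element_Symbol    = rep(c("Ag", "Ga"), each = 6),
  Element_Value_ppm = c(0.11, 0.11, 0.13, 0.14, 0.15, 0.66,
                        12, 18, 21, 23, 27, 31)
)

# One row of statistics per element
stats_by_element <- t(sapply(split(toy$Element_Value_ppm, toy$Element_Symbol), describe))
print(round(stats_by_element, 3))
```

In the toy data, the single large Ag value (0.66) pulls the Ag kurtosis well above that of the evenly spread Ga values, which is the behaviour the paragraph above relies on when it treats high kurtosis as a sign of outliers.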
The table below presents the descriptive statistics of the elements from the ME-4ACD81 test that will be used for predictive modelling.
Next, we analyse the distribution of each critical element based on its PAAS-normalised values. Figure 1.1 provides insights into the spread, central tendency, and potential outliers for each critical element.
Figure 1.1: Distribution of Critical Elements (Box-Plot)
As depicted above, there are some key points that we would like to raise, which are:
| Element Symbol | median |
|---|---|
| Re | 5.000 |
| Ag | 3.600 |
| Au | 3.333 |
| Mo | 2.667 |
| Bi | 2.402 |
| Li | 2.000 |
| Cu | 1.880 |
| Eu | 1.648 |
| Dy | 1.517 |
| Gd | 1.513 |
| Sm | 1.460 |
| Lu | 1.438 |
| Yb | 1.400 |
| Tm | 1.364 |
| Tb | 1.359 |
| MREE | 1.344 |
| Er | 1.320 |
| Ho | 1.306 |
| U | 1.304 |
| Ga | 1.291 |
| Y | 1.250 |
| Pb | 1.235 |
| Nd | 1.221 |
| Sc | 1.191 |
| REEY | 1.151 |
| REE | 1.130 |
| Th | 1.107 |
| V | 1.103 |
| Pr | 1.092 |
| LREE | 1.064 |
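A median table of this kind could be produced along the following lines. This is a sketch under the assumption that the table lists elements whose median PAAS-normalised value exceeds 1, sorted from highest to lowest; the toy values are made up and only the column names follow the dataset.

```r
# Toy long-format data (made-up values); column names follow the dataset
toy <- data.frame(
  Element_Symbol        = c("Re", "Re", "Re", "Ag", "Ag", "Ag", "Al", "Al", "Al"),
  PAAS_normalised_value = c(4, 5, 7, 2.2, 3.6, 10, 0.2, 0.4, 0.9)
)

# Median normalised value per element
med <- aggregate(PAAS_normalised_value ~ Element_Symbol, data = toy, FUN = median)
names(med)[2] <- "median"

# Keep elements whose median is above background (> 1), highest first
above <- med[med$median > 1, ]
above <- above[order(-above$median), ]
print(above, row.names = FALSE)
```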
Figure 1.2: Distribution of Selected Critical Elements (Box-Plot)
In this analysis, we assess all critical elements against the PAAS standard. We start with a high-level distribution across the two categories used to identify which elements fall above or below the standard. A normalised value is flagged as “Enriched above background” if it is above 1; otherwise it is flagged as “Below background”. Figure 1.3 provides the details of this high-level distribution.
Figure 1.3: The Profile of PAAS Categories
As depicted in the bar chart, the distribution is fairly balanced between the two categories, with 5,724 instances classified as “Enriched above background” and 5,355 instances classified as “Below background”. This balance highlights the importance of more detailed analysis to understand the factors contributing to the distribution, the significance of enrichment in the context of the dataset, and how these elements behave under different conditions.
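The flagging rule and the category counts can be sketched as follows. The measured and reference values here are placeholders rather than values taken from the dataset, and the normalisation by simple division is an assumption based on the data preview; the column names, including `Above_PASS_flag`, follow the dataset.

```r
# Toy data: measured and reference concentrations are placeholders
toy <- data.frame(
  Element_Symbol    = c("Ag", "Ag", "Al", "Ga", "Ga"),
  Element_Value_ppm = c(0.13, 0.50, 17000, 12.9, 31.6),
  PAAS_value_ppm    = c(0.05, 0.05, 100000, 17, 17)
)

# Assumed normalisation: measured ppm divided by the PAAS reference ppm
toy$PAAS_normalised_value <- toy$Element_Value_ppm / toy$PAAS_value_ppm

# Values above 1 are enriched; everything else (including exactly 1) is below
toy$Above_PASS_flag <- ifelse(toy$PAAS_normalised_value > 1,
                              "Enriched above background",
                              "Below background")

# Counts per category, the quantity a bar chart of the flags would display
flag_counts <- table(toy$Above_PASS_flag)
print(flag_counts)
```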
Moving on to the element level, we assess whether each critical element’s PAAS-normalised values are above or below the background value. Figure 1.4 shows the profile of each sample against this standard, together with its flag.
Figure 1.4: Distribution of elements with reference to PAAS levels’ Concentration
As can be seen, the majority of normalised values fall between 0 and 10. Some highlights from this plot are:
Figure 1.5: The Profile of Project Area with reference to PAAS levels’ Concentration
Some key observations from the plot are:
Lastly, we look at the distribution of each critical element within every project area, shown in Figure 1.6 below. Continuing the previous analysis, this part focuses on the concentration of each element in each project area. For instance, we look for elements that have a wide range of concentrations within one project, and elements that fall both above and below background in the same project area.
Figure 1.6: Distribution of Critical Elements by Each Project Area
As shown by the graphs above, some significant insights are:
Figure 1.7: Correlation Matrix Plot of Critical Elements
Figure 1.7 shows the relationships between the various elements. Some key observations are:
Figure 1.8: Correlation Matrix Plot of Selected Critical Elements
In this last section, we dig deeper into the critical elements that have a correlation value above 0.95 (as mentioned in the previous points). For context, Er, Dy, Gd, Eu, Ho, Pr, Nd, Sm, Tb, and Yb belong to the lanthanide series. The lanthanides are the first 15 f-block elements, with atomic numbers from 57 to 71. Together with yttrium, which shares many chemical properties with the lanthanides, these elements make up the rare earth elements (REEs) (Mattocks et al., 2021). Their high correlation values are most likely a consequence of this shared chemistry.
Furthermore, Seredin (2012) notes that the first suggestions for recovering lanthanides and yttrium (REY) as by-products from coal deposits can be traced back some 20 years, following the discovery of coal beds with high REY content (0.2% - 0.3%) in a Russian Far East (RFE) basin. Additional coal seams with comparable or even higher REY concentrations (up to 1.0% in ash) were identified in six coal-bearing basins across the same region. Since then, REY-rich coal has also been discovered in coal basins in various other countries. Thus, the presence of lanthanides in coal and coal by-products may be explained by their association with the materials that make up coal.
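The selection of element pairs at the 0.95 level can be sketched as below. The example assumes Pearson correlation computed on a samples-by-elements matrix, and it treats a value of exactly 0.95 as passing; both are assumptions, since the report does not spell them out. The toy concentrations are made up.

```r
# Toy long-format data (made-up values): Pr and Nd move together, Zn does not
long <- data.frame(
  Sample_ID         = rep(sprintf("S%d", 1:6), times = 3),
  Element_Symbol    = rep(c("Pr", "Nd", "Zn"), each = 6),
  Element_Value_ppm = c(3.5, 5.1, 7.8, 9.8, 12.0, 15.4,      # Pr
                        14.8, 20.0, 31.8, 39.5, 47.1, 62.3,  # Nd
                        66, 17, 101, 30, 88, 12)             # Zn
)

# Reshape to a samples-by-elements matrix (one column per element)
wide <- tapply(long$Element_Value_ppm,
               list(long$Sample_ID, long$Element_Symbol), mean)

# Pairwise Pearson correlations, tolerating missing element/sample combinations
cor_mat <- cor(wide, use = "pairwise.complete.obs")

# Keep each pair once (upper triangle) when it meets the assumed threshold
threshold <- 0.95
idx <- which(upper.tri(cor_mat) & cor_mat >= threshold, arr.ind = TRUE)
strong_pairs <- data.frame(
  element_1   = rownames(cor_mat)[idx[, "row"]],
  element_2   = colnames(cor_mat)[idx[, "col"]],
  correlation = cor_mat[idx]
)
print(strong_pairs)
```

Because correlation is unchanged when a variable is divided by a positive constant, using raw ppm values or PAAS-normalised values per element gives the same result.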
Figure 1.9: Strongly Correlated Critical Elements (Correlation Above 0.95)
Figure 1.9 above shows the correlations between these elements, with colour added as an extra dimension to differentiate the project areas. Many of the pairs align closely, as shown by the tight spread of points around the straight line; this is the case for the Pr & Nd, Gd & Eu, Ho & Er, Tb & Dy, Yb & Er, Sm & Nd, and Sm & Pr plots. In other plots, a few data points deviate from the majority. These could be outliers or cases where the relationship diverges slightly; this is the case for the Er & Dy, Ho & Dy, Yb & Dy, Yb & Ho, Tb & Ho, and Tb & Gd plots. Although the Sm & Nd and Sm & Pr plots belong to the first group, a closer look by project area shows that the grey dots (which represent ‘Wandoan’) deviate from the line. This suggests that the correlation between these elements is weaker in Wandoan. The plots below examine these elements in more detail.
Figure 1.10: Correlated Critical Elements (Sm, Pr, Nd) in Wandoan
As can be seen in the correlation matrix in Figure 1.10, the correlation between Sm and Nd in Wandoan is 0.95, right at the minimum threshold that we chose. However, the correlation between Sm and Pr is below 0.95, which is consistent with the points deviating from the line. Because this value is below our threshold, the Sm and Pr pair from this project will most likely be excluded from the predictive modelling.
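A per-project check of this kind can be sketched as follows: the correlation for one element pair is computed separately within each project area and compared with the threshold. The project names `Project_X` and `Project_Y` and all values are hypothetical, and treating exactly 0.95 as passing is an assumption.

```r
# Toy wide-format data: one row per sample, hypothetical projects and values
toy <- data.frame(
  Project_Name = rep(c("Project_X", "Project_Y"), each = 5),
  Pr = c(2, 4, 6, 8, 10,   2, 4, 6, 8, 10),
  Sm = c(1.9, 3.6, 5.5, 7.1, 9.0,   2.0, 5.5, 4.0, 9.0, 6.5)
)

# Pearson correlation between Sm and Pr within each project area
cor_by_project <- sapply(split(toy, toy$Project_Name),
                         function(d) cor(d$Sm, d$Pr))

# TRUE where the pair meets the assumed threshold for that project
keep <- cor_by_project >= 0.95
print(round(cor_by_project, 3))
print(keep)
```

Here the pair would be retained for `Project_X`, where the points lie close to a line, and dropped for `Project_Y`, where they scatter.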